PLOS Digital Health
Top medRxiv preprints most likely to be published in this journal, ranked by match strength.
Background: Clinicians in care management programs are often in low supply relative to patient demand, especially in US Medicaid programs, and must simultaneously address clinical risk, time efficiency, and patients' social needs. Many studies have shown that large language models may assist with summarizing patient care, for example by generating care plans; yet these studies also show that different objectives given to agents often conflict and create problems for safety, efficiency an...
Background: Large Language Models (LLMs) show promise for clinical decision support in Intensive Care Units (ICUs), but their safety and reliability have not been adequately evaluated through dual testing of both memory-dependent and memory-independent safety mechanisms. Objective: To comprehensively evaluate LLMs using two independent safety tests: context-dependent contraindication memory (penicillin allergy recall) and context-independent authority resistance (Extended Milgram Test), revealing whether...
Digital health technologies are powerful tools for enhancing data collection, participant engagement, and personalized health interventions, yet their rapid proliferation has outpaced guidance for research participant protection. Current practice assists researchers in identifying risks but provides limited support for comprehensive risk management. To address this gap, we developed the Digital Health Checklist-Risk Management (DHC-RM) Tool, which integrates the established Digital Health Checklist with ap...
Clinical prediction models are often created using large, routinely collected datasets. It is essential that prediction models are developed with appropriate data and methods and transparently reported to ensure that decisions are based on reliable predictions. Kaggle is a popular competition website where users learn and apply analysis skills on a range of datasets. We identified two large, publicly available Kaggle datasets, on stroke and diabetes, that lack clear data provenance but are widel...
Objective: The use of ambient AI documentation tools is rapidly growing in US hospitals and clinics. Such tools generate the first draft of clinical notes from scribed patient-provider conversations, which clinicians can then review and edit before signing into electronic health records (EHR). Understanding how and why clinicians make modifications to AI-generated drafts is critical to improving AI design and clinical efficiency, yet it has been under-studied. Th...
Background: Delivering timely, high-quality feedback on resident scholarly projects is labour-intensive, especially in large programmes. We developed an AI-assisted evaluation system, powered by the open-weight LLaMA-3.1 large language model (LLM), to generate formative feedback on Family Medicine residents' scholarly projects and compared its performance with that of expert human evaluators. Methods: We evaluated whether the AI-generated feedback achieves quality comparable to expert feedback. The tool ing...
Introduction: Healthcare organizations have begun incorporating screening procedures for social determinants of health (SDOH) into care, recognizing the impact these factors can have on health outcomes. We aimed to present methods for evaluating redundancy in the risk information gained across SDOH questions and for assessing whether demographic biases exist in which patients were asked SDOH questions and which patients declined to answer them. Methods: SDOH question data were analyzed for 1...
Background: Artificial intelligence is increasingly embedded in healthcare delivery. Its legitimacy depends on institutional governance, not on technical performance alone. Prior research has centered on clinicians and patients; less attention has been given to cybersecurity professionals, who sustain the digital infrastructures that support health AI. This study examines how cybersecurity professionals conceptualize AI as clinical infrastructure and how these interpretations shape understandings of t...
Cross-device medical federated learning, in which individual patients rather than institutions participate directly, poses a unique challenge: each client holds only a few samples, often just one (e.g., a single diagnostic record), leaving insufficient local data for gradient computation. Existing approaches such as Secure Aggregation require client-to-client coordination that is impractical for intermittently available mobile devices, while homomorphic encryption introduces substantial computational ove...
Background: Large language models (LLMs) are increasingly deployed in medical contexts as patient-facing assistants, providing medication information, symptom triage, and health guidance. Understanding their robustness to adversarial inputs is critical for patient safety, as even a single safety failure can lead to adverse outcomes including severe harm or death. Objective: To systematically evaluate the safety guardrails of state-of-the-art LLMs through adversarial red-teaming specifically designe...
Background: Tinnitus affects a substantial proportion of the global population and can severely disrupt sleep, mood, and daily functioning, yet the quality of mobile health apps designed for tinnitus management remains highly variable. Traditional evaluation methods, including clinical trials, expert rating scales, and small-scale surveys, rarely capture large-scale, feature-level feedback from real-world users, leaving a gap in understanding which app characteristics drive sustained engagement an...
Background: Breakthroughs in model architecture and the availability of data are driving transformational artificial intelligence in healthcare research at an exponential rate. The shift in the types of models used can be attributed to the multimodal properties of foundation models, which better reflect the inherently diverse nature of clinical data, and to advancing model implementation capabilities. Overall, the field is maturing from exploratory development towards application in real-world evaluation a...
Artificial intelligence (AI) is increasingly integrated into healthcare delivery, yet patient acceptance in resource-constrained settings remains incompletely characterized. This study assessed attitudes toward AI-supported care among patients attending hospitals in three Jordanian governorates (Amman, Balqa, Irbid) and examined demographic and digital literacy correlates of acceptance. In a cross-sectional survey (n = 500 complete questionnaires), participants rated exposure to AI in healthcare...
Purpose: To evaluate whether large language models (LLMs) can enhance clinician-patient communication by simplifying radiology reports to improve patient readability and comprehension. Methods: A randomised controlled trial was conducted at a single healthcare service for patients undergoing X-ray, ultrasound or computed tomography between May 2025 and June 2025. Participants were randomised in a 1:1 ratio to receive either (1) the formal radiology report only or (2) the formal radiology report and...
Background: EHR documentation and chart review contribute to clinician workload and burnout. To alleviate pre-charting burden, Epic has released a new generative AI chart summarizer tool, which has become widely adopted; however, its impact has not been examined in randomized trials. Objective: To evaluate whether access to an Epic generative AI chart summarization tool reduces cognitive task load among ambulatory providers compared with usual care. Methods: Two-arm, parallel-group randomized contro...
Ambient intelligence-based systems are increasingly used for clinical documentation. To quantify linguistic differences associated with ambient documentation, we conducted a matched pre-post analysis of 6,026 outpatient clinical notes from Mass General Brigham following implementation of two ambient AI documentation systems (Nuance Dragon Ambient eXperience [DAX] and Abridge). Within-clinician comparisons focused on the History of Present Illness (HPI) and Assessment and Plan (A&P) sections and ...
Background: Generative artificial intelligence (GenAI) in healthcare may reduce administrative burden and enhance quality of care. Large language models (LLMs) can generate draft responses to patient messages using electronic health record (EHR) data, which could mitigate the increased workload related to high message volumes. While the effectiveness and feasibility of these GenAI tools have been studied in the United States, evidence from non-English contexts is scarce, particularly regarding user experie...
Data scarcity and stylistic heterogeneity pose major challenges for emotion intensity classification. This paper presents a cross-dataset augmentation framework that leverages prompt-conditioned generative models alongside deterministic and heuristic transformations to synthesize target-style examples for improved transfer learning. We introduce a unified taxonomy of augmentation strategies: Heuristic Lexical Perturbation (HLA), Prompt-Conditioned Generative Augmentation (CGA), Sequential Hybrid...
Large language models (LLMs) are increasingly used by the public to seek health information, yet their reliability in addressing common vaccine myths remains unclear. We conducted an exploratory multi-vendor evaluation of three LLMs (GPT-5, Gemini 2.5 Flash, Claude Sonnet 4) using officially curated vaccination myths from Germany's public health institution and two realistic user framings as prompts: a curious skeptic and a convinced believer. All model responses were independently evaluated by t...
Background: Clinical risk prediction models are typically evaluated by discrimination (area under the receiver operating characteristic curve, AUC), with calibration receiving less attention. We developed a multi-timeframe diabetes prediction framework emphasizing calibration and used synthetic data validation to investigate whether good discrimination guarantees good calibration. Methods: We generated 500,000 synthetic patients using published epidemiological parameters from QDiabetes-2018, FINDRI...